*TESTING FOR ASSOCIATIONS WITH CONTINUOUS DATA

*open previous do file or start a new do file
*load presidents.dta from the Examples folder in the 17.871 course locker
*use "C:\gl\current\lab class\Examples\presidents.dta", clear
list
sum

*SCATTER PLOT
scatter incvote gnp4, scheme(sj)
scatter incvote rdi, scheme(sj)

*compare the same elections
drop if rdi==.
*(or scatter incvote gnp4 if rdi!=., scheme(sj))

*add regression line (helps understand correlation)
scatter incvote gnp4 || lfit incvote gnp4, scheme(sj)
scatter incvote rdi || lfit incvote rdi, scheme(sj)

*Improving scatter plots
*drop the legend
scatter incvote gnp4 || lfit incvote gnp4, scheme(sj) legend(off)
*label the y-axis
scatter incvote gnp4 || lfit incvote gnp4, scheme(sj) legend(off) ytitle("Incumbent vote %")
*label the points
scatter incvote gnp4, mlabel(year) || lfit incvote gnp4, scheme(sj) legend(off) ytitle("Incumbent vote %")
scatter incvote rdi, mlabel(year) || lfit incvote rdi, scheme(sj) legend(off) ytitle("Incumbent vote %")

*Which is more (Pearson) correlated with vote choice?
corr incvote gnp4 rdi, cov
corr incvote gnp4 rdi

*REGRESSION
*compare with scatter plot
reg incvote rdi
*look at gnp4
scatter incvote gnp4, mlabel(year) || lfit incvote gnp4, scheme(sj) legend(off) ytitle("Incumbent vote %")
reg incvote gnp4

*WHY PREFER REGRESSION TO CORRELATION
scatter incvote rdi, mlabel(year) || lfit incvote rdi, scheme(sj) legend(off) ytitle("Incumbent vote %")
corr incvote rdi
reg incvote rdi
*reduce the range of rdi
corr incvote rdi if rdi > 2 & rdi < 5
reg incvote rdi if rdi > 2 & rdi < 5
*correlation changes dramatically, regression coefficient doesn't
scatter incvote rdi if rdi > 2 & rdi < 5, mlabel(year) msymbol(i) || lfit incvote rdi if rdi > 2 & rdi < 5, scheme(sj) legend(off) ytitle("Incumbent vote %")

*TIMESERIES FIGURES
line incvote year
line incvote year || line gnp4 year
line incvote year, yaxis(1) || line gnp4 year, yaxis(2)
label variable incvote "Incumbent vote %"
line incvote year, yaxis(1) || line rdi year, yaxis(2)

*TESTING BIVARIATE RELATIONSHIPS WITH DISCRETE DATA

*reload presidents.dta from the Examples folder in the 17.871 course locker
*use "C:\gl\current\lab class\Examples\presidents.dta", clear

*continuous DV, but discrete explanatory variable
tab incterms
*recode 5 to 4 or more
recode incterms 5 = 4
table incterms, c(mean incvote)
table incterms, c(mean incvote freq)

*Always better to see a figure
scatter incvote incterms, mlabel(year)

*Alternatives: Box plots
graph box incvote, scheme(sj) over(incterms)
*Improve figure
graph box incvote, scheme(sj) over(incterms, relabel(1 "1 Term" 2 "2 Terms" 3 "3 Terms" 4 "4 or more Terms")) ytitle("Incumbent vote %")

*Timeseries plot (not always appropriate)
line incvote year, yaxis(1) || line incterms year, yaxis(2)
line incvote year || line incterms year

*Check with regression
reg incvote incterms
reg incvote incterms gnp4

*Another approach to graphing continuous DV and discrete explanatory variables
collapse incvote year, by(incterms)
list
scatter incvote incterms, scheme(sj)
label variable incterms "Incumbent terms"
scatter incvote incterms || lfit incvote incterms, scheme(sj) legend(off) ytitle("Incumbent vote %")